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Abstract 

MAXENT is now a common species distribution modeling (SDM) tool used by conservation practitioners for predicting the 
distribution of a species from a set of records and environmental predictors. However, datasets of species occurrence used 
to train the model are often biased in the geographical space because of unequal sampling effort across the study area. This 
bias may be a source of strong inaccuracy in the resulting model and could lead to incorrect predictions. Although a 
number of sampling bias correction methods have been proposed, there is no consensual guideline to account for it. We 
compared here the performance of five methods of bias correction on three datasets of species occurrence: one "virtual" 
derived from a land cover map, and two actual datasets for a turtle {Chrysemys picta) and a salamander (Plethodon 
cylindraceus). We subjected these datasets to four types of sampling biases corresponding to potential types of empirical 
biases. We applied five correction methods to the biased samples and compared the outputs of distribution models to 
unbiased datasets to assess the overall correction performance of each method. The results revealed that the ability of 
methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species. However, the 
simple systematic sampling of records consistently ranked among the best performing across the range of conditions 
tested, whereas other methods performed more poorly in most cases. The strong effect of initial conditions on correction 
performance highlights the need for further research to develop a step-by-step guideline to account for sampling bias. 
However, this method seems to be the most efficient in correcting sampling bias and should be advised in most cases. 
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Introduction 

A key issue in ecology and conservation biology is to determine 
how species are distributed in space. Since extinction risk is 
associated with range size [1], a significant reduction of a species 
range often determines change in conservation status (see for 
example lUCN criteria [2,3]) and prime conservations actions 
[4,5]. Likewise, protected areas usually focus on biodiversity 
hotspots [6] in order to conserve efficiently as many species as 
possible [7-9]. Therefore, conservationists often need precise 
assessments of species ranges. Beyond simple range description, 
identifying which main factors limit distributions is essential to 
efficiently forecast the benefits of conservation management. In 
order to deal with these questions, several methods of species 
distribution modeling (SDM), also known as ecological niche 
modeling (ENM) [10], have been developed since the 1980s [11]. 

The principle of SDM is to relate known locations of a species 
with the environmental characteristics of these locations in order 
to estimate the response function and contribution of environ- 
mental variables [12], and predict the potential geographical range 



of a species [13]. These models estimate the fundamental 
ecological niche in the environmental space (i.e. species response 
to abiotic environmental factors [14]) and project it onto the 
geographical space to derive the probability of presence for any 
given area or, depending on the method, the likelihood that 
specific environmental conditions are suitable for the target species 
[15]. Distribution models are used by conservation practitioners to 
estimate the most suitable areas for a species and infer probability 
of presence in regions where no systematic surveys are available 
[16]. They can also assess the potential expansion of introduced 
species in newly colonized areas [17,18], estimate the future range 
of a species under climate change [18,19] or assist in reserve 
planning [20]. 

Several statistical models exist to predict the distribution of a 
species [21]. Beyond classical regression methods (Resource 
Selection Function RSF [22,23], Generalized Linear Models 
GLM [24]), algorithmic modeling based on machine learning (for 
example Artificial Neural Networks [25], Maximum Entropy 
MAXENT [26], Classification And Regression Trees CART [27]) 
have become increasingly popular in recent years. Among these. 
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MAXENT has been described as especially efficient to handle 
complex interactions between response and predictor variables 
[15,28], and to be little sensitive to small sample sizes [29]. This, as 
well as its extreme simplicity of use, has made MAXENT the most 
widely used SDM algorithm. In December 2013, 1886 citations of 
the article describing the method [30] were reported in Web of 
Science. 

MAXENT modeling, and SDM in general, is now commonly 
implemented in conservation-oriented studies [31]. Regional or 
continent-wide studies are facilitated by the recent availabiht}' of 
global datasets. Environmental layers, such as the global climate 
variables developed in the WorldClim project [32], offer 
continuous description of very large areas [33]. Similarly, the 
development of open biodiversity databases (see for example the 
Global Biodiversity Information Facility, GBIF, http://www.gbif 
org) increases manifolds the spatial coverage of fieldwork 
observations that could have been collected by a single project. 
Such databases usually j)ro\ide presence-only data that can be 
handled by modeling methods like MAXENT. 

However, datasets derived from opportunistic observations or 
museum records rather than from planned surveys often exhibit a 
strong geographic bias [34], some areas being visited more often 
than others because of their accessibility [35] or their naturalistic 
interest. This unequal survey coverage of a species distribution is 
often referred as sampling bias, sample selection bias or survey 
bias. The quality of the model can be strongly affected if entire 
parts of the environmental space suitable to a species are absent or 
poorly represented in the survey dataset [36,37], or alternatively, if 
some areas are overrepresented due to locally high sampling 
efforts. Several studies questioned the effect of sampling design 
[38], or the biased nature of museum and herbarium datasets [39] 
on the predictive performance of SDMs. Surprisingly, the issue of 
quantifying and correcting sampling bias has been poorly 
addressed despite its crucial importance. Although authors pointed 
out that the distribution of locations in the geographical and/or 
ecological space may impact the reliability of the model [35,36,40- 
43], the potential effect of the sampling bias in the dataset is 
usually poorly taken into account or not considered at all. 
However, very different SDM outputs can be generated that lead 
to contrasting conclusions whether sampling bias is corrected or 
not [44], making SDM studies that did not incorporate this issue 
highly doubtful. 

In regard to the considerable influence of sampling bias on 
SDM prediction ability, Araiijo et al. [45] considered the 
improvement of sampling designs as one of the five major 
challenges for future development of SDMs. Several bias 
correcting methods have been proposed [46-51] but they have 
been rarely used so far. Comparison and evaluation of different 
methods to correct sampling bias have only been recentiy carried 
out and no consensual guideline emerged to solve it. A few recent 
studies explored the consequences of and potential solutions to 
correct for sampling bias (by Syfert et al. [41], Kramer-Schadt 
[52], Varela et al. [53] and Boria et al. [54]). In spite of their 
interest, authors investigated a single case study so that it is not 
possible to evaluate the efficiency of a correction method across 
species. Second, the empirical bias caused by samphng intensity 
[41] was never tested and no more than two correction techniques 
have been evaluated at the same time whereas many more have 
been proposed or used in the literature [47,49,55]. Therefore, the 
influence of the nature and intensity of bias on the capacity of 
various techniques to correct for sampling bias has not been 
investigated. This remains however a critical issue, especially for 
users who need robust and reliable SDM predictions such as 
conservation practitioners. 



The goal of this comparative study is to test the effect of bias 
t\pe, bias intensity, and correction method on MAXENT model 
performance. Unlike the previous cited studies [41,52-54] we 
assessed the performance of five bias correcting methods among 
the most frequently used under various conditions of bias type and 
intensity. We used a virtual species to generate four types of 
sampling biases and three bias intensities, and applied on these 
biased datasets different corrections. We quantified the relative 
correction performance across the range of bias conditions and 
across species. The same framework was also applied on two real 
datasets. The full workflow on which analyses were based is 
sketched in Figure 1 . Therefore, the present study provides for the 
first time a comprehensive multi-species evaluation of the most 
common methods of sampfing bias correction under different 
scenarios of bias and intensities of bias. Intended for conserva- 
tionists who use MAXENT on a regular basis, we expect this work 
to provide insights on the selection of the most suitable methods to 
produce reliable distribution models using biased datasets. 
Furthermore, it encourages modelers to develop improvements 
of techniques to correct for sampling bias suitable for the vast set of 
modeling methods avaflable. 

Materials and Methods 

Species datasets 

In order to obtain a true unbiased dataset, we created a virtual 
species by randomly sampling a set of points in a given 
environment determined by a single categorical variable. A similar 
approach has been used formerly [52]. We extracted 2000 random 
coordinates from the "Closed to open shrubland" category of the 
North American Globcover map (Globcover 2009, http://due. 
esrin.esa.int/globcover [56]) to generate an unbiased dataset. The 
geographical extent of the virtual species was chosen to match the 
scale of the two real species ranges which are also both located in 
North America. In practice, the virtual species i o\ (Tcd the largc-r 
part of western North America, including the lower valleys of the 
Rocky Mountains, Mojave Desert, Baja California peninsula, and 
Northern Mexico (Figure 2). 

We additionally used the occurrence datasets of two real species 
(Figure 2). We compiled a set of 1825 occurrences for the painted 
turtie {Chiysemys picta) downloaded from the World Turtle 
Database (http://emys.geo.orst.edu; accessed on May 2011). 
The second species was the white-spotted sKmy salamander 
{Pkthodon cylindraceus) for which we coUected a set of 208 
observations. A part of the dataset was provided by J. MUanovich 
and additional records were obtained from the Global Biodiversity 
Information Facility (http:/ /w^vw.gbif org, acc(;ssed on May 
2011). According to our knowledge of the distribution of these 
species, these original datasets seem relatively unbiased, i.e. the 
distribution of records over space reflects the known spatial 
distribution of the species. 

Environmental predictors 

We used distinct sets of environmental predictors depending on 
the modeled species. Climatic and topographic grids were 
downloaded from the WorldClim database [32] (http:/ /www. 
worldclim.org) at a resolution of 2.5 arc-min (4.63 km at the 
ecjuator). The global map of land cover provided by the European 
Spatial Agency was downloaded in its 2009 version (Globcover 
2009 [56], http://due.esrin.esa.int/globcover) and rescaled to fit 
the 2.5 arc-min resolution of the other variables. Finally, we 
compiled 5 years of 10-day periods of NDVI (2007-2011), a 
measure of vegetation productivity derived from multispectral 
remote-sensing images, downloaded from the SPOT-VEGETA- 
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Figure 1. Workflow used in analyses. Original datasets of a virtual and 2 real species were altered to create 12 bias combining 4 bias types and 3 
bias intensities. Five methods of sampling bias correction were employed to assess the improvement in the modeled distribution relative to the 
original distribution using MAXENT. Correction performance was assessed using AUC and 3 measures of overlap between the corrected the original 
unbiased model. 

doi:1 0.1 371/journal.pone.00971 22.g001 



TION project [57] (http:/ /free.vgt.vito.be). We averaged across 
these 5 years three layers of mean, minimum and maximum 
annual NDVI. 

We removed for each species some highly intercorrelated 
(correlation coefficient computed by ArcGIS 10; >0.9 or < — 0.9) 
variables because multicoUinearity may violate statistical assump- 
tions and may alter model predictions [58]. The resulting variable 
sets were composed of 14 predictors (Table 1). Since the 
geographical distribution of the virtual species and Chrysemys picta 
records covered a large range in North America, we modeled both 
species across the same geographic area across North America. 
Plethodon cjlindracms occurrences are restricted to a smaller area of 
Eastern USA. Accordingly, the geographical range of predictors 
was restricted to a narrower area (Table 1). 

Generation of sampling bias 

The three original datasets were altered to generate four types of 
bias that might occur when collecting observations (Figure 3). The 
original datasets were thus subsampled so that the remaining 
records were biased in the geographical space. We also created 
three levels of bias intensity, hereafter referred as "low", 
"medium" and "high" to assess the effect of this parameter on 
model outputs (Figure 3). For each species, each combination of 
bias type (4) and bias intensity (3) was replicated 10 times resulting 
in a total of 360 biased datasets used to model distribution. The 
four types of sampling bias were generated as follows: 

(1) TWO AREAS - The original dataset was biased such that its 
northern part exhibited a high density of records and the southern 
part a low density. This kind of bias is common when a species is 



systematically monitored in one part of its range and not surveyed 
in the other, for instance in different countries or groups of 
countries [44]. 

(2) GRADIENT - We generated a density gradient of 
observations decreasing from the north to the south of the range. 
This bias is close to the first one but here record density changed 
gradually. Such a bias would not reflect a difference in survey 
schemes between administrative divisions but a gradual reduction 
of sampling intensity towards a limit of species range. 

(3) CENTER - The density of occurrences gradually decreased 
from the core of the distribution to the periphery. Such bias 
mimics cases in which sampling effort is concentrated in the centre 
of the known range of the species, whereas peripheral areas, 
potentially less suitable [59], are neglected. 

(4) TRAVEL TIME - We used the travel time to the nearest 
city, using a map produced by the European Commission [60] 
(available at bioval.jrc.ec.europa.eu/products/gam). This variable 
integrates both the distance to the city and the presence of road 
networks. This map was used as a grid of sampling probability, in 
which probability of keeping a record was highest close to cities 
and in areas with dense road networks. This bias corresponds to a 
common situation where most of records are located around cities 
or along roads [35,61]. 

The full details of the generation of sampling bias are given in 
Supporting information. Material SI. 

Species distribution modeling 

We used for modeling the software MAXENT [30] , a machine 
learning algorithm that applies the principle of maximum entropy 
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Figure 2. Locations of records used for modeling. (A) Virtual 
species; (B) Chrysemys picta; (C) Plethodon cylindraceus 
doi:l 0.1 371/journal.pone.0097122.g002 

to j)rcdi(:t the potential distribution of species from pres(;n( (>only 
data and environmental variables [26]. Currentiy, this widely used 
method is particularly efficient to handle complex interactions 
between response and predictor variables [15,28], and is little 
sensitive to small sample sizes [29]. All models were computed 
using the version 3.3.3k of MAXENT (http://www.cs.princeton. 
edu/~schapire/maxent/). Runs were conducted with the default 



variable responses settings, and a logistic output format which 
results in a map of habitat suitabiUty of the species ranging from 0 
to 1 per grid cell, wherein the average observation should be close 
to 0.5 [15]. The models were evaluated by the area under the 
ROC curve (AUG), and three measures of overlap with the 
unbiased model (see below section "Model evaluation and 
statistical analyses"). 

Methods of sampling bias correction 

We apphed on all our biased datasets five methods of bias 
correction that have been Eilready published. In order to evaluate 
their usefulness in real conditions, we used these methods as if the 
source, shape or strength of the sampling bias was unknown. 
Therefore, we did not select a correction method according to our 
knowledge of the bias, as this information is unknown in most 
empirical studies. 

(1) Systematic Sampling. A subsample of records regularly 
distributed in the geographical space was selected [46,54,55,62]. 
MAXENT already discards redundant records that occur in a 
single cell. We removed neighboring occurrences at a coarser 
resolution than MAXENT does. We created a grid of a defined 
cell size and randomly sampled one occurrence per grid cell. This 
subsampling reduces the spatial aggregation of records but does 
not correct the lack of data due to low sampling effort in some 
areas. This method could also underestimate the contribution of 
suitable areas where the high density or records reflects the true 
ecological value for the species. The resolution of the reference 
grid was 2 degrees for Chrysemys picta and the virtual species, and 
0.2 degree for Pkthodon cylindraceus. 

(2) Bias File. This option is implemented in MAXENT. The 
software can be fed with a bias grid [49,63] that is a sampling 
probability surface. The cell values reflect sampling effort and give 
a weight to random background data used for modeling. An ideal 
way of creating biasfiles would be to represent the actual sampling 
intensity across the study area. Although it can be roughly 
estimated by the aggregation of occurrences from closely related 
species [48], in most real modeling situations, this information is 
lacking. Thus, instead of using our knowledge of the artificially 
created biases, we produced bias grids by deriving a Gaussian 
kernel density map of the occurrence locations, rescaled from 1 to 
20, following Elith et al. [63]. These maps were implemented in 
the biasfile option in MAXENT. 

(3) Restricted Background. MAXENT, as most other 
presence-pseudoabsence methods, generates a "background" or 
"pseudo-absence" sample of points [15]. It has been argued that 
the selection of background points may strongly affect the resulting 
model [64—66]. By default 10000 pseudo-absences are randomly 
selected from the whole rectangular study area. This approach was 
followed for all the other cases, as most SDM studies keep the 
default MAXENT selection of background points. However, 
according to Phillips [47], if occurrences are restricted to a fraction 
of the study area, model performance can be enhanced by drawing 
the background points from this fraction of the area. The 
reliability of predictions should be improved when the model is 
transferred to the rest of the area. Following this recommendation, 
we randomly sampled 10000 pseudo-absences in buffer areas 
around occurrences and used them as background samples in 
MAXENT. Buffer size was a radius of 500 km for the virtual 
species and Chrysemys pitta, and 100 km for Plethodon cylindraceus. 

(4) Cluster. Biased datasets typically lead to spatial autocor- 
relation of records and artificial spatial clusters of observations thus 
violating the assumption of independence [67]. This bias can be 
circumvented by sampling one point per cluster in environmental 
space [53,68,69]. We first performed a principal component 
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Table 1. Environmental predictors included in MAXENT modeling for the virtual species, Chrysmeys picta and Plethodon 
cylindraceus. 







Virtual species 


Chrysemys picta 


Plethodon cylindraceus 


Variable 


rtiiiTuae 


X 


X 


X 


Annual mean temperature 


X 


X 


X 


Mean diurnal range 


X 


X 


X 


Isothermality 


X 


X 


X 


Temperature annual range 


X 


X 


X 


Mean temperature of wettest quarter 


X 


X 


X 


Annual precipitation 


X 


X 


X 


Precipitation seasonality 


X 


X 


X 


Precipitation of warmest quarter 


X 


X 


X 


Precipitation of coldest quarter 


X 


X 


X 


Land cover 


X 


X 


X 


Maximum NDVI 


X 


X 




Mean NDVI 


X 


X 


X 


Minimum NDVI 


X 


X 


X 


Mean temperature of driest quarter 






X 


Extent 


Longitude 


min 


-140.00 


-140.00 


-88.63 


max 


-50.00 


-50.00 


-70.68 


Latitude 


min 


20.00 


20.00 


28.91 


max 


75.00 


75.00 


45.46 



The selected layers are indicated for each species, as well as the extent of modeling (in decimal degrees}. 
doi:l 0.1 371 /journal.pone.00971 22.t001 



analysis (PCA) on the environmental descriptors of occurrences 
using the "ade4" R package [70] in order to define independent 
axes in the environmental space. Then, we ran a cluster analysis 
based on Euclidean distance on PCA axes space. The resulting 
dendrogram was used to define a number of classes corresponding 
to half of the occurrences. One record was randomly sampled per 
class and the models were run on these subsampled datasets. 

(5) Split. When occurrence frequency gready differs between 
two areas because of unequal sampling effort, the area can be split 
in two strata within which coverage probability is more 
homogeneous, and one model be computed for each stratum. 
This method has been used for species occurring over a large 
distribution range and extended environmental gradients 
[44,51,71]. We split our biased datasets in a northern and a 
southern stratum. We combined the model outputs to produce a 
composite model for the entire range, keeping the highest value in 
pixels where strata overlapped. 

Evaluation of correction methods 

To estimate the ability of each correction method to recover the 
information contained in the original unbiased data set, we used 4 
criteria that correspond to the interests of different end users of 
SDMs. We compared the models obtained after applying a bias in 
the dataset to the original model, and after applying a sampling 
bias correction method. The original dataset of the virtual species 
was created to be unbiased. The model computed from it will be 
thus referred as the unbiased model. Although we do not formally 



know the actual bias in the Plethodon and Chrysemys original 
datasets, we will also refer to the models computed from their 
original datasets as unbiased models. We should keep in mind that 
we compare the change in the resulting model, whatever the 
original bias. The models referred as biased were computed after 
applying a sampling bias and the corrected models after applying a 
correction method to the biased dataset. 

(1) AUG. The area under the receiver operating curve (ROC), 
known as the AUC is one of the most common statistics to assess 
model performance. AUC can be interpreted as the probability 
that a presence cell have a higher predicted value than a absence 
cell (or pseudo-absence), both of them being chosen randomly 
[28]. Although the use of AUC for the evaluation of ecological 
models has been criticized, especially when calculating against 
background points rather than true-absences [72,73], it should be 
reliable to compare models generated for a single species in the 
same area and the same predictors. 

The calculation of AUC was performed using the R package 
"PresenceAbsence" [74], using as test points the fraction of the 
original dataset which was excluded to create the biased dataset. 
The metrics was calculated by the comparison between these test 
points and either (i) 1 0000 true absence points sampled outside the 
range, which corresponds here to the true occupancy for the 
virtual species, or (ii) 10000 background points randomly sampled 
in all the study area for the real species. For original models, in the 
absence of true test points, a mean AUC value was computed 
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SDMs, we applied the PCA-env method on 500 points sampled 
from the SDM outputs instead of on the input records. We 
selected these points in the output map using the SDM probability 
of presence as a sampling probability. 

(4) GovER overlap between binary maps. SDM outputs 
are often converted to binary maps that are more tractable for 
conservationists who for instance need to delineate protected 
areas. In such maps, a pixel is considered as either suitable to the 
species or not. We used the non-fixed 10* percentile training 
presence threshold value to generate binary maps as proposed by 
Liu et al. [80] . We then measured the overlap between biased and 
corrected models, and unbiased models. This measure stricdy 
measures the geographical overlap whereas the D index estimates 
the overlap of ecological niches in environmental space {D„^ or 
projected onto the geographical space (-D^J. 

To evaluate the performance of bias correction, we derived new 
indicators from AUG, Z)^„, Dg„„, and G„„„ that quantify the 
improvement of the corrected model to the biased model, 
standardized by the difference between the unbiased and the 
biased models. These indices, named respectively AAUC, AZ)gj„, 
AG,,,,_ and AZ)„„ were calculated as follows: 



AAUC — {AUCcorrecteci ^^C/jiascd) / {^UCufihiascd ^UCt>kised) 
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■ected 



S<^''hiased 



■"""^'biased 



1-A 



S'-'"biased 



1-G, 



"""biased 



Figure 3. Generation of sampling bias for the virtual species. To 

generate artificial sampling bias, the original dataset (here the virtual 
species) was altered into 4 different types of bias (rows), each with 3 
intensities (columns). 
doi:1 0.1 371 /journal.pone.00971 22.g003 

using 5 random splits of the dataset, each subsample being used in 
turn to evaluate the model. 

(2) DoEo overlap in the geographical space. Several 
metrics of niche overlap are available (e.g. D [75], modified 
Hellinger distance / [76] or BC [77]). We used the Schoener's D 
index [75] that has been suggested to be best suited for SDM 
outputs [78]. This statistics considers the probability distributions 
across space of the difference in the probability of presence of two 
species, based on their respective distribution models. Dg„ index is 
ranged from 0 (no overlap) to 1 (complete overlap, identical 
models). 

(3) Dgf^y overlap in the environmental space. We 

estimated niche overlap in the environmental space between the 
unbiased, biased, and corrected models. We used the PCA-env 
approach described by Broennimann [79] to calculate Schoener's 
D index based on the environmental characteristics of two sets of 
occurrences. This approach defines the environmental space by 
the two first axes of a principal component analysis of all the pixels 
of the study area. The niche overlap is calculated from the 
smoothed density of occurrences in the environmental space 
following a kernel density function applied on each dataset. 
Depending on the correction method used, some models can be 
based on the same input dataset {e.g. the biasfile correction uses the 
same records as the biased model). In order to compare our 



AD,, 



D. 



'-'"^'corrected 



-D. 



''"'biased 



i-D,„,, 



These four indices range from -oo to 1, a positive value 
indicating that the model was actually corrected (with 1 
corresponding to perfect correction, i.e. corrected model exacdy 
similar as the unbiased one) whereas a negative value indicates 
that the correction produced a worse model than the biased one. 

Results 

Effect of bias type and intensity on model outputs 

The biased models resulted in a reduction of AUG in all cases 
and a deviation from the unbiased model for all overlap measures 
(Figure 4). The deviation varied largely depending on species and 
bias type though: 0.28-0.91 forZ)g,„, 0.18-0.89 for A„„, 0.03-1 for 
and Go„„. For the virtual species, the bias type yielding the largest 
deviation depended on the performance measures considered: "2 
areas" for the AUG, "travel time" for Dg„, "Center" for Z)j„„ and 
"Gradient" for G„„„ the latter providing the lowest values of 
overlap with the original unbiased models. However, the 
differences between unbiased and biased models remained 
moderate (mean percentage variation: AUG: —1.74%; Z)^„: — 
24.41%; D„„: -24.62%; G„„: -30.07%). All bias types had also 
globally similar effects in terms of deviation from the unbiased 
model (Figure 4). For Chrysemjs picta, all evaluation measures were 
strongly affected by the "center" bias. AUG decreased more than 
5%, and overlaps with the unbiased model ranged from 0.26 to 
0.49. In contrast, the P. cylindmceu.s dataset was overall weakly 
affected by the biases so that the biased models did not lead to 
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noticeable difTerences with the unbiased model. The strong effect 
of the "center" bias was visible in Plethodon cjlindraceus only for the 
overlap of binary maps (G„„,-). This bias in which only the central 
zone is sampled may exclude a large part of the original 
environmental space and lead to very inaccurate SDM outputs. 
Interestingly, the decrease in AUG performance for all bias types 
was more pronounced in C. picta than the two other datasets even 
when the values of overlap were in the same range. 

Relative performance of correction method 

Since we evaluated the performance of correction methods 
using indices with different sets of assumptions, interpretation may 
slightiy differ with the measure considered. However, as we mainly 
aimed at comparing SDM outputs, i.e. maps of habitat suitability, 
we primarily focused our interpretations on Ai)^„„ which truly 
evaluates the overlap between standard SDM maps. Moreover, 
the two measures based on Schoener's D, in geographic (AZ)^,„J 
and in environmental space {ADg„„), were highly correlated and 
outputs of bias correction were qualitatively similar (Supporting 
information. Figure SI). We discuss here results for ADj,„ only. 

Correction performance strongly depended on the species 
(Table 2). Considering the three species together, less than half 
(29%) of all combinations (species x bias type x bias intensity x 
correction method) yielded corrected models (following AD^,„) with 
more accurate predictions than the biased model. For the virtual 
species, and considering AD^„, 57% of corrected models (34 out of 
60 combinations of bias type, bias intensity and correction 
method) were more similar to the model generated with the 
unbiased dataset than the biased model (Table 2). Most cases for 
which no method was able to provide bias correction were 



"center" and "travel time" biases, with medium to high intensities. 
Conversely, only 7% of P. cjlindraceus models were corrected (4 
cases out of 60), while 25% of C. picta models were corrected (15 
cases out of 60) and offered a better result than the biased model. 

Regardless of the species, the bias type, and the metrics 
considered, the restricted background method failed to improve 
the biased models in almost all tested cases. The other methods 
performed better but were ranked differently depending on bias 
type. Systematic sampling performed slightly better and more 
consistently among the competing methods as shown by the 
relative performance of each method across bias types (Figure 5). 
Although systematic sampling was not always ranked first, it 
showed very little deviation from the most performing method and 
performed on average better than the others (for ADg„: mean rank 
Systematic sampling = 2.1 1± 1.08 SD; mcau rank Split =2.53 ±1.08 
SD, mean rank cuisster = 2.61 3± 1.23 SD; mean rank Bia.sfiie 
= 3.31 ±1. 31 SD). In contrast, the restricted background method 
recorded the least correction (mean rank Restricted background 
= 4.44±1.13 SD). 

Overall, and considering only AD^,„, the systematic sampling 
method was able to correct the bias (AD^,„>0) in 33% of the test 
cases. This success rate rose to 66% in the case of the virtual 
species, for which we were able to compare to a true unbiased 
model. However, the biasfile corrected the initial bias in 23% of 
test cases. The cluster and split method were both efficient in 23% 
of cases while only 6% of cases were corrected by the restricted 
background method. 

Interestingly, relative performance between methods was 
consistent across metrics (Figure 5). The restricted background 
method was always the least performing one in terms of AAUC, 
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Figure 5. Rank of each method to correct sampling bias. Mean ranks ± standard-error for the performance of each method to correct 
sampling bias for each species (Chrysemys picta: left, Plethodon cylindraceus: center, virtual species: right), following 3 measures of correction 
performance: AAUC (left), ADg^^ (centre), and AG^j, (right). For each type of bias and bias intensity, the method which results in the most efficient 
correction is set to 1 whereas the least powerful method is set to 5. The plotted values are the mean rank across the 4 types of bias and 3 intensities. 
doi:1 0.1 371 /journal.pone.00971 22.g005 



ADg„ and AG„^„ (but for C. picta and AZ)^,„, for which it is ranked 
4/5). The systematic sampling method was among the most 
performing methods. However, even if systematic sampling was 
overall the most efficient method across species in terms of AZ)„„, it 
was outperformed by the split or biasfde methods in some cases for 
the virtual species, or by the cluster method under some 
combinations of bias x intensity for the two real species 
(Table 2). However, when the systematic sampling was unable to 
resolve the bias, this latter was most often equally poorly corrected 
by any of the methods tested. 

Discussion 

As an unexpected first finding, we noticed that the range of 
AUG values obtained for biased and corrected models remained 
high even for models with the strongest biases. The decrease in 
AUG observed after applying the bias was moderate, less than 2% 
on average, across species and bias type. Moreover, the AUG 
values of the biased models were almost always over 0.8 or 0.9, 
which would classify the models as "good" or "very good" (Araujo 
et al. [81] adapted from Swets [82]). Together with other studies 
[72,73,83] our results highlight that this measure may poorly 
reflect model accuracy. Therefore, studies that focus solely on the 
AUG value should interpret their results with caution. AUG may 
be a good statistical measure of discrimination ability, but it often 
fails to quantify the ecological realism of modeled distribution 
[72,73,83] especially when estimated from presence-only data. 
Because we have a reference model, we will mainly focus on the 
overlap indices with the unbiased model as a measure of predictive 
accuracy performance. 

Gontrary to previous studies investigating sampling bias 
correction in SDM that focused on a few methods and simple 
biases [41,52-54], we reviewed here five different ways to deal 
with sampling bias and used both real and virtual datasets under 
various bias scenarios. We also considered bias intensity that has 
been to our knowledge never assessed and proved to be of as a 
high concern as the type of bias. Moreover, instead of relying only 



on classical measures of SDM performance such as AUG (as used 
in Syfert et al. [41], Varela et al. [53] and Boria et al; [54]) or 
omission/commission error (as used in Kramer-Schadt et al. [52]), 
we evaluated the correction performance by directly comparing 
the SDM outputs. Therefore, we actually assessed the ability of the 
tested methods to recover the unbiased model, which is the 
expected behavior of an efficient sampling bias correction. In 
addition, rather than basing our conclusions on island species 
[41,52,54], we used continental species whose distributions are 
clearly shaped by climate and not by a geographically bound 
space. 

Our results clearly evidence that the different methods of 
sampling bias correction tested here may have very variable 
efficiency depending on the modehng conditions (biases type and 
correction method). Interestingly, the correction may have a 
positive effect, and actually contributes to correct the bias; 
nonetheless, in some cases it may produce a poorest model than 
the biased model. These results suggest that the problem of 
sampling bias in species distribution modeling has probably 
multiple answers depending on the context. We especially 
emphasize that the type and intensity of bias influence the ability 
of various methods to resolve the initial bias. 

However, correction methods did not perform equally across 
the various conditions. The less eflicient method restricted the 
spatial extent of the background whereas in other methods, the 
background points were selected from the whole available 
environment {i.e. randomly drawn from the area covered by the 
environmental grid files). Surprisingly, this method is often used 
and have been contributed to improve SDM performance in some 
cases [47,48]. However, as suggested by ThuiUer et al. [84] and 
Vanderwal et al. [85], excessively restricting the geographical 
extent of pseudo-absences to a narrow area or selecting them from 
a too large area reduces model accuracy. Background selection 
may greatly influence the resulting model as it determines the 
underlying assumptions of the model to use [64] . Therefore, this 
step should be undertaken with caution. The size of the buffer used 
for background selection also gready influences model perfor- 
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mance. For instance, AUG often increases with the size of the 
study area because it contributes to include background points that 
have environmental characteristics gready distant from the species 
requirement, resulting in artificial increase of SDM validation 
[65] . The selection of the training area should therefore be stricdy 
relevant to the ecology of the species and the objective of the study. 
A relevant selection of the training area (the geographic region in 
which background points are selected) should reflect the 
geographical space accessible to the species over a given time 
period [65]. It may thus be essential to carry out a rigorous 
investigation of the optimal geographic distance between the set of 
occurrences used to train the model and background points. It has 
to be both optimal for model training and biologically meaningful. 
The interpretation of the modeled distribution must also be 
engaged carefully as it may reflect the fundamental niche or the 
true occupied range, and often a position between both. 

Regarding the high variability in correction performance of the 
different methods depending on various factors, it is difficult to 
propose a universal guideline to solving sampfing bias. It might be 
advisable to evaluate first several types of correction. The final 
choice of correction method would be then based on their effect in 
classical SDM evaluation metrics (e.g. AUG, Kappa, True SkiU 
Statistics TSS) and possibly the adequacy of output maps to a priori 
knowledge of the species distribution. A first useful step might also 
be to evaluate bias type to design and select the most appropriate 
correction. For instance, the split method makes sense only if the 
most and less sampled areas are at least roughly known. Most of 
the time, sampling bias is only inferred by the known sampling 
effort or empirical knowledge of the species distribution. The true 
severity and shape of this bias is almost always lacking. However, 
in some cases, bias can be evaluated by comparing the geographic 
distribution of the available occurrences to known sources of bias. 
An estimation of sampling probabilities across the study area can 
provide insights on the potential bias that may affect the collected 
observations and help in further choice of the correction method. 
The known characteristics of the modeled species may also 
condition the strategy to use. A species with an expected very large 
geographical and/or environmental range should benefit from 
being split into two or more partitions that are combined 
afterward [51]. Finally, the five methods tested here are not 
necessarily exclusive. For instance, it is possible to split a large 
dataset and apply systematic sampling to each datasct in spc( i(;s 
with broad distributions [44] or to apply the biasfile method after 
using first another correction method 

Nevertheless, we found that only systematic sampling constantiy 
performed well irrespective of species and bias type. Interestingly 
enough, it happened to be the simplest and most obvious way to 
solve the geographic bias. Beside, this method can be quickly 
applied to any dataset, even if the nature of the bias is unknown. 
Kramer-Schadt et al. [52] and Boria et al. [54] also identified this 
technique, which they named spatial filtering, as the most effective 
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